Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

207 rollup compaction simplified #232

Merged

Conversation

alexeiakimov
Copy link
Contributor

Description

The present PR provides initial implementation for indexing and optimization based on the rollup compaction as it is proposed in the #207. The most important changes are

  1. A single index file can contain blocks from different cubes
  2. The format of the file metadata stored in the tags of AddFile has changed to support multiple blocks.
  3. Cube state is computed from the replicated flags of its blocks and id not persisted in the DeltaLog metadata anymore.
  4. The Qbeast metadata appended to the metadata of the Delta has changed to exclude the information about the replicated cubes.
  5. While indexing the cube blocks to be stored together in the same physical index file are grouped using the rollup compaction algorithm.
  6. While compacting the cube blocks to be stored together in the same physical file are grouped using the rollup compaction algorithm.

Type of change

This PR introduces incompatible changes in the metadata format for both individual index files and for the DeltaLog as well. The tables written with the previous version cannot be read by the present version, and vice versa. However the existing tables can be converted (int theory) to the new format by transforming the AddFile tags and DeltaLog metadata without touching the actual files.

How Has This Been Tested? (Optional)

The correctness tests are provided in the project code. Also @Jiaweihu08 did some extra performance test comparing the proposed version with the previous one and pure Delta.

Copy link

codecov bot commented Nov 14, 2023

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (f066acf) 92.00% compared to head (2d39821) 90.86%.
Report is 20 commits behind head on main-1.0.0.

Files Patch % Lines
.../main/scala/io/qbeast/spark/delta/IndexFiles.scala 94.20% 4 Missing ⚠️
...o/qbeast/spark/delta/writer/RollupDataWriter.scala 98.44% 2 Missing ⚠️
...re/src/main/scala/io/qbeast/core/model/Block.scala 94.44% 1 Missing ⚠️
src/main/scala/io/qbeast/spark/QbeastTable.scala 50.00% 1 Missing ⚠️
...ala/io/qbeast/spark/delta/writer/BlockWriter.scala 96.96% 1 Missing ⚠️
...in/scala/io/qbeast/spark/delta/writer/Rollup.scala 96.00% 1 Missing ⚠️
...la/io/qbeast/spark/index/query/QueryExecutor.scala 90.90% 1 Missing ⚠️
src/main/scala/io/qbeast/spark/utils/Params.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##           main-1.0.0     #232      +/-   ##
==============================================
- Coverage       92.00%   90.86%   -1.15%     
==============================================
  Files              88       93       +5     
  Lines            2214     2386     +172     
  Branches          168      178      +10     
==============================================
+ Hits             2037     2168     +131     
- Misses            177      218      +41     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@cugni cugni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of files changes, most of them small refactoring, but it is a bit hard to review like this. But the central class seems well written and the logic the one discussed. I only have 1 doubt about a test.

@osopardo1
Copy link
Member

We are finally merging the rollup!! Great job!! @alexeiakimov

@osopardo1 osopardo1 merged commit 6e222c2 into Qbeast-io:main-1.0.0 Nov 24, 2023
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants